SUMMARY


High-end graphics workstations are becoming a 
necessary tool in the computational fluid dynamics 
environment. In addition to their graphics capabili- 
ties, workstations of the latest generation have pow- 
erful floating-point-operation capabilities. As worksta- 
tions become common, they could provide valuable 
computing time for such applications as turbomachin- 
ery flow calculations. This report discusses the is- 
sues involved in implementing an unsteady, viscous 
multistage-turbomachinery code (STAGE-2) on work- 
stations. It then describes work in which the worksta- 
tion version of STAGE-2 was used to study the effects 
of axial-gap spacing on the time-averaged and unsteady 
flow within a 2½-stage compressor. The results include
time-averaged surface pressures, time-averaged pres- 
sure contours, standard deviation of pressure contours, 
pressure amplitudes, and force polar plots. 
INTRODUCTION

Flows in turbomachines are difficult to analyze 
because of the time-varying geometries and inherently 
unsteady flow. Experimental techniques exist to
investigate the time-averaged and unsteady flow within
turbomachines, but they can be expensive to use. Be- 
cause of this, analytical techniques have been used to 
supplement the knowledge gained from experimenta- 
tion. As computer resources became available, com- 
putational techniques were also used to supplement 
knowledge of turbomachinery flows. In these earlier 
works, various levels of approximation were applied 
to make turbomachinery flow computations tractable 
on the available computers. Unfortunately, these ap- 
proximations also restricted the usefulness of the com- 
putational model and the information it generated. 
Only recently have two- and three-dimensional un- 
steady viscous-flow computations been possible; how- 
ever, these unsteady analyses have been considered im- 
practical for routine design purposes because of their 
memory usage, run times, and dependence on super- 
computer technology. Improvements in computer tech- 
nology are rapidly making these computations practi- 
cal on a range of computers from supercomputers to 
single-user workstations. 

Supercomputers are expensive to buy, maintain, 
and upgrade. Because of this, they tend not to be re- 
placed or upgraded until they are seriously overutilized. 
This leads to long job queues and slow turnaround 
times on jobs. Raw computer speed is irrelevant if 
jobs are unable to get through the system in a reason- 
able amount of time. To the researcher, the wall-clock 
time is often more critical than the cpu time required 
for convergence. 

Even on an unloaded supercomputer, job account- 
ing procedures can limit the amount of cpu time avail- 
able to an individual. Typically, an individual is allo- 
cated a certain amount of time or is charged for time 
used. In either case, supercomputer cpu usage has to 
be carefully budgeted, and other sources of cpu time 
must be found. A reasonable compromise to these 
constraints has been provided by the latest generation 
of workstations. A dedicated workstation can provide 
wall-clock time performance on the order of that of a 
heavily loaded supercomputer at a comparatively low 
cost to the researcher. This report discusses the issues 
involved in implementing a two-dimensional, unsteady, 
viscous, multistage turbomachinery code (STAGE-2) 
on workstations. 

Results from STAGE-2 were compared with
experimental data for a single-stage turbine and a
2½-stage compressor in Gundy-Burlet, Rai, and Dring
(1989) and Gundy-Burlet et al. (1990). In the current
study, STAGE-2 was used to examine the effect of axial
gap on the unsteady flow in a 2½-stage compressor.
The axial gaps used in this study were 20%, 35%, and
50% of the average axial chord in the compressor. The
time average and standard deviation of the pressure
field were used to investigate steady and unsteady
flow features. In addition, surface pressures and
force polar plots were examined. Coarse-grid results 
were obtained on workstations; fine-grid results were 
obtained on both supercomputers and workstations. 

The author would like to thank Marcel Burlet, 
Dan Magenheimer, and Tim Bailey of Hewlett-Packard
for their help in obtaining timings on Hewlett-Packard 
workstations. In addition, the author would like to 
thank Dr. A. Sugavanam, Sundar Raman, and Gwen 
Swan of IBM for providing both timings and compu- 
tational time on IBM workstations. 

ALGORITHM 

The current work is based on an extension of an
approach developed by Rai, which is discussed in
detail in Rai (1987) and Rai and Madavan (1990) and
is reviewed briefly here. The flow field is divided
into two basic types of zones. Inner “O” grids are
used to resolve the flow field near the airfoils.
These “O” grids are overlaid on outer “H” grids,
which are used to resolve the flow field in the
passages between airfoils. The “H” grids are allowed
to slip relative to one another to simulate the
relative motion between rotors and stators. Thin-layer
Navier-Stokes equations are solved in the inner zones, 
where viscous effects are important, and Euler equa- 
tions are used in the outer zones, where viscous ef- 
fects are weak. The governing equations are cast in 
the strong conservation form. A fully implicit finite- 
difference method is used to advance the solution of the 
governing equations in time, and a Newton-Raphson 
subiteration scheme is used to reduce the lineariza- 
tion and factorization errors at each time step. The 
convective terms are evaluated using a third-order- 
accurate upwind-biased Roe scheme, and the viscous 
terms are evaluated using second-order-accurate central 
differences. The Baldwin-Lomax (1978) turbulence 
model is used to compute the turbulent eddy viscosity. 
Details of the turbulence model, zonal and natural
boundary conditions, grid configuration, bookkeeping
system, and database management systems are
discussed in Gundy-Burlet, Rai, and Dring (1989).
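
The role of the Newton-Raphson subiterations can be illustrated on a scalar model problem. The sketch below is not the STAGE-2 scheme: it assumes a backward-Euler step for a hypothetical equation du/dt = R(u), and the names implicit_step, rhs, and drhs_du are introduced here only for illustration. The first subiteration is equivalent to a linearized implicit step; each further subiteration reduces the remaining linearization error.

```python
import math


def implicit_step(u_old, dt, rhs, drhs_du, subiterations=2):
    """Advance du/dt = rhs(u) by one backward-Euler step.

    Each pass applies one Newton-Raphson update to the nonlinear
    equation (u - u_old)/dt - rhs(u) = 0, starting from u_old.
    """
    u = u_old
    for _ in range(subiterations):
        residual = (u - u_old) / dt - rhs(u)
        jacobian = 1.0 / dt - drhs_du(u)
        u -= residual / jacobian
    return u


# Hypothetical model problem: du/dt = -u**2, u_old = 1, dt = 0.1.
def model_rhs(u):
    return -u * u


def model_drhs(u):
    return -2.0 * u


# Exact root of the backward-Euler equation u + 0.1*u**2 = 1.
exact = (-1.0 + math.sqrt(1.4)) / 0.2
err_one = abs(implicit_step(1.0, 0.1, model_rhs, model_drhs, 1) - exact)
err_two = abs(implicit_step(1.0, 0.1, model_rhs, model_drhs, 2) - exact)
```

In this toy case the second subiteration lowers the error in the implicit solution by several orders of magnitude, which is the behavior the subiteration scheme is meant to provide.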

COMPUTER ARCHITECTURE ISSUES 

Efficient implementation of a computer code re- 
quires knowledge of the computer on which it will 
operate. There are several basic differences between 
the architectures of supercomputers and workstations 
that require implementation changes in the code. One 
major difference between supercomputers and work- 
stations is the size of their main memories. For in- 
stance, the NASA Ames CRAY-2 has an internal mem- 
ory of 2 Gbytes, and a typical workstation has between 
8 Mbytes and 64 Mbytes of memory. Extended mem- 
ory, such as disk or Solid State Disk (SSD), is also 
available on a supercomputer. Virtual-memory work- 
stations can also access disk when main memory is 
used up. 

The speed of transfer of information between 
main memory and extended memory is another ma- 
jor difference between workstations and supercomput- 
ers. Supercomputers have high-bandwidth channels 
between main memory and disk or SSD that provide 
fast input/output (I/O). Software support for unblocked,
random-access I/O is also usually provided in FORTRAN
on supercomputers. This is an efficient mechanism
to output data to disk or SSD. Workstations do not
support high-bandwidth channels between main mem- 
ory and disk. If the disk is used to supplement main 
memory on a virtual-memory machine, the “swapping” 
of data between disk and main memory can be ex- 
tremely slow. In addition, it is difficult to perform 
efficient I/O from FORTRAN on a workstation. 

A third difference between workstations and the 
CRAY in particular is that the CRAY is a vector pro- 
cessor and the workstation is a scalar processor. The 
large memory combined with the vector capabilities of 
the CRAY provides the opportunity to trade memory 
usage for speed. For instance, although block tridiago- 
nal inversions have data dependencies that inhibit vec- 
torization, they can be vectorized by processing sev- 
eral inversions simultaneously. This requires additional 
memory usage to store the block elements for each in- 
version, but can dramatically accelerate a code. Since 
current workstations are scalar processors, this vector- 
ization strategy would not speed up the code on a work- 
station. If the additional memory usage required that 
virtual memory (disk) be accessed, the wall-clock time
used could actually increase by an order of magnitude. 
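
The memory-for-speed trade described above can be sketched with a simpler stand-in. The example below assumes scalar tridiagonal systems solved with the Thomas algorithm rather than the block tridiagonal systems in STAGE-2, and the [row][system] storage layout and function name are hypothetical. The recurrence over rows remains sequential, but the inner loop over independent systems has no data dependencies, which is the loop a vector processor could exploit. The price is work arrays sized for every system at once.

```python
def solve_tridiagonal_batch(lower, diag, upper, rhs):
    """Solve m independent tridiagonal systems of size n together.

    Every argument is indexed [row][system]. The values in lower[0]
    and upper[n-1] lie outside the matrix and do not affect the result.
    """
    n = len(diag)
    m = len(diag[0])
    # Work arrays hold one entry per row and per system: this is the
    # additional memory traded for a dependency-free inner loop.
    c_mod = [[0.0] * m for _ in range(n)]
    d_mod = [[0.0] * m for _ in range(n)]

    for s in range(m):
        c_mod[0][s] = upper[0][s] / diag[0][s]
        d_mod[0][s] = rhs[0][s] / diag[0][s]
    # Forward elimination: sequential over rows, independent over systems.
    for i in range(1, n):
        for s in range(m):
            denom = diag[i][s] - lower[i][s] * c_mod[i - 1][s]
            c_mod[i][s] = upper[i][s] / denom
            d_mod[i][s] = (rhs[i][s] - lower[i][s] * d_mod[i - 1][s]) / denom

    # Back substitution with the same loop ordering.
    x = [[0.0] * m for _ in range(n)]
    for s in range(m):
        x[n - 1][s] = d_mod[n - 1][s]
    for i in range(n - 2, -1, -1):
        for s in range(m):
            x[i][s] = d_mod[i][s] - c_mod[i][s] * x[i + 1][s]
    return x


# Two 3x3 systems solved side by side (columns are systems).
solution = solve_tridiagonal_batch(
    lower=[[0.0, 0.0], [1.0, -1.0], [1.0, -1.0]],
    diag=[[2.0, 4.0], [2.0, 4.0], [2.0, 4.0]],
    upper=[[1.0, -1.0], [1.0, -1.0], [0.0, 0.0]],
    rhs=[[3.0, 2.0], [4.0, 4.0], [3.0, 10.0]],
)
```

On a scalar processor the same arithmetic is performed either way, so the extra work arrays bring no speedup and only enlarge the memory footprint.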

With these factors in mind, the workstation ver- 
sion of the STAGE-2 code was designed to store data 
in internal buffer arrays. These arrays are efficiently 
packed to reduce memory usage and to minimize or 
eliminate the use of virtual memory. In addition, many 
arrays that are used to enhance vector processing are 
eliminated in the workstation version of the code. Ap- 
proximately 120 bytes of memory are required per grid 
point for the workstation version of STAGE-2. 
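
As a rough check on what this figure implies, the following sketch estimates solver storage for the grid sizes used in this study. It assumes 1 Mbyte = 2**20 bytes and counts only the 120 bytes per point quoted above, not the operating system or the code itself; the function name is hypothetical.

```python
BYTES_PER_POINT = 120  # approximate figure quoted for the workstation version


def memory_mbytes(grid_points, bytes_per_point=BYTES_PER_POINT):
    """Estimated solver storage in Mbytes (1 Mbyte taken as 2**20 bytes)."""
    return grid_points * bytes_per_point / 2**20


coarse_mbytes = memory_mbytes(50_367)    # coarse grid
fine_mbytes = memory_mbytes(102_064)     # largest fine grid (50% gap)
```

Under these assumptions the coarse grid needs a little under 6 Mbytes and the largest fine grid a little under 12 Mbytes, so both fit within the main memory of a 16-Mbyte workstation.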

GEOMETRY AND GRID 

The 2½-stage compressor geometry used in this
study models the midspan geometry of an experiment 
that is part of the AGARD (1989) collection of test 
cases for computations of internal flows in aero-engine 
components. Much of the data for this compressor is 
also tabulated in Dring and Joslyn (1985). The experi- 
mental configuration consists of an inlet guide vane fol- 
lowed by two rotor/stator pairs. There are 44 airfoils in 
each row, leading to a 1:1 ratio of airfoils from row to 
row down the compressor. As it would be prohibitively 
expensive to compute the flow through the entire 
220-airfoil system, the flow is computed through only 
one passage, and periodicity is used to model the other 
43 passages. The axial gaps between airfoil rows in the 
experimental configuration are approximately 50% of 
the average axial chord. In this study the flow through 
the compressor is computed with the same midspan 
airfoil geometry, but with varying axial gaps. 

In Gundy-Burlet, Rai, and Dring (1989) and 
Gundy-Burlet et al. (1990), a parabolic-arc inlet guide
vane was used because the actual vane geometry was 
unavailable. The vane geometry has recently become 
available and is used in the current calculation. The 
first and second stages of the compressor are similar, 
except that the first-stage rotor is closed 3° from axial 
relative to the second-stage rotor. This reduces the an- 
gle of attack of the first-stage rotor. The airfoil sections 
are all defined by NACA 65-series airfoils imposed on 
a circular-arc mean camber line. The average chord is 
4 in. 

A zonal grid system is used to discretize the flow 
field within the 2½-stage compressor. Figure 1 shows
the zonal grid system used for the 20%-gap case. For 
clarity in figure 1, every other point in the grid has 
been plotted. There are two grids associated with each
airfoil: an inner, body-centered “O” grid and an outer,
sheared, Cartesian “H” grid. The thin-layer
Navier-Stokes equations are solved on the inner grids, whose
grid points are clustered near the airfoil to resolve the 
viscous terms, and the Euler equations are solved on 
the outer grids. The rotor and stator grids are allowed 
to slip past each other to simulate the relative motion 
between rotor and stator airfoils. In addition to the 
two grids used for each airfoil, there are an inlet and 
an exit grid, thus yielding a total of 12 grids. 

In order to generate inner grids that are wholly 
contained by the outer grids but not distorted, it is 
necessary to overlap the rotor and stator outer grids in 
the gap regions for the 20%-axial-gap case. This can 
be seen in the 20%-axial-gap grid shown in figure 1. 
This required a modification of the grid generator and 
algorithm, and permitted study of turbomachines with
small axial gaps. 

Coarse grids are used to validate workstation re- 
sults. The inner grids are dimensioned 151 x 31. The 
outer grids have a varying number of points in the axial 
direction, but they all have 61 points in the circumfer- 
ential direction. The inlet and outlet grids have 28 and 
30 points in the axial direction, respectively. The outer 
grids associated with an airfoil average 77 points in the 
axial direction. This leads to a total of 50,367 points 
for all zones in the coarse-grid configuration. 

Fine grids are used to obtain detailed data regard- 
ing the steady and unsteady flow structure in the com- 
pressor. The inner grids are dimensioned 214 x 44. 
The outer grids have a varying number of points in the 
axial direction because of the change in axial gap and 
axial extent of each airfoil, but they all have 87 points 
in the circumferential direction. The inlet and outlet
grids have 40 and 42 points in the axial direction,
respectively. The outer grids associated with an air- 
foil average 99 points in the axial direction for the 
20%-gap case, 101 points for the 35%-gap case, and 
110 points for the 50%-gap case. This leads to a total 
of 97,279 points for all zones in the fine-grid config- 
uration for the 20%-gap case, 98,323 points for the 
35%-gap case, and 102,064 points for the 50%-gap 
case. 
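
The quoted totals can be reproduced from the zone dimensions. The sketch below rests on two assumptions that are not stated in the text: the inlet and exit grids use the same number of circumferential points as the outer grids, and the summed axial point counts of the five outer grids are 384 (coarse), 495, 507, and 550 (fine, 20%, 35%, and 50% gaps). These sums are inferred values chosen to be consistent with the rounded averages of 77, 99, 101, and 110 points per airfoil.

```python
def total_points(n_airfoils, inner_dims, circ_points,
                 outer_axial_sum, inlet_axial, exit_axial):
    """Total grid points over all zones of the 12-grid system."""
    inner = n_airfoils * inner_dims[0] * inner_dims[1]
    # Outer, inlet, and exit grids are assumed to share circ_points.
    outer = circ_points * (outer_axial_sum + inlet_axial + exit_axial)
    return inner + outer


coarse_total = total_points(5, (151, 31), 61, 384, 28, 30)

# Inferred axial sums for the fine grids, keyed by axial gap in percent.
fine_axial_sums = {20: 495, 35: 507, 50: 550}
fine_totals = {
    gap: total_points(5, (214, 44), 87, axial_sum, 40, 42)
    for gap, axial_sum in fine_axial_sums.items()
}
```

With these inputs the function returns the coarse-grid and fine-grid totals given above.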

RESULTS 

The results reported in this section are for the 
2½-stage compressor described above. These results
were all computed at an inlet Mach number of 0.07, 
an inlet Reynolds number of 100,000 per in., and
a pressure rise of C_p = 1.11. Several approximations
should be considered when interpreting the following
results. The flow in the compressor is
three-dimensional with end-wall boundary layer growth,
hub corner stall, and tip leakage effects. Because
STAGE-2 is a two-dimensional code, it is unable to compute
these three-dimensional effects. Stream-tube contrac- 
tion terms have not been implemented in the code, so 
the effect of the end-wall boundary layer growth is not 
modeled. 

For the coarse-grid computation, 2 subiterations 
per time step and 500 time steps per cycle are suffi- 
cient to provide stability and yet eliminate transients 
from the solution. A cycle is defined as the time it 
takes a rotor to move from its position relative to one 
stator to the corresponding position relative to the next 
stator. The code was benchmarked on several different
workstations for a 50,367-point grid and for 2
subiterations per time step. Five hundred time steps per
cycle and 2 subiterations per time step were found to
be sufficient for converging this coarse-grid
2½-stage-compressor calculation. Table 1 gives the code’s
performance on several workstations, which range in price
from $10,000 to $100,000. It is not the purpose of 
this study to present price/performance comparisons 
between different workstations. Instead, it is meant 
to show that STAGE-2 will operate on a wide vari- 
ety of workstations and to give a general idea of its 
performance on these workstations. The performance 
is measured both by cpu time per iteration per grid 
point (cpu/it/pt) and by the MFLOP rate. The MFLOP 
rate for the workstations was computed by determining 
the number of floating-point operations in a run using a 
profiler on a CRAY-YMP. The number of floating-point 
operations is assumed to be the same for the worksta- 
tions, and is divided by the cpu time to get an overall 
MFLOP rate. 
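
Because the same operation count is assumed for every machine, the product of cpu/it/pt and the MFLOP rate should be roughly constant across table 1. The sketch below checks this using the table 1 values, with the two CRAY-YMP timings read as microseconds; the function name is hypothetical, and the implied operation count is a derived figure, not one reported in this study.

```python
# (machine, cpu seconds per iteration per grid point, MFLOPS) from table 1.
TABLE_1 = [
    ("CRAY-YMP scalar", 81.4e-6, 58.0),
    ("CRAY-YMP vector", 35.7e-6, 131.0),
    ("HP 9000/835", 2.34e-3, 2.0),
    ("HP 9000/720", 0.37e-3, 12.9),
    ("IBM 6000/530", 0.50e-3, 9.6),
    ("SGI 4D25", 1.57e-3, 3.0),
    ("SGI 4D210GTX", 1.24e-3, 3.8),
    ("SGI 4D320VGX", 0.95e-3, 5.0),
]


def flops_per_iteration_point(cpu_seconds, mflops):
    """Floating-point operations implied by a timing and an MFLOP rate."""
    return cpu_seconds * mflops * 1.0e6


implied_flops = {
    name: flops_per_iteration_point(cpu, rate) for name, cpu, rate in TABLE_1
}
```

Every entry implies between about 4,700 and 4,800 floating-point operations per iteration per grid point, which is consistent with a single profiled count having been divided by each machine's cpu time.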

Timings for the CRAY-YMP are included in ta- 
ble 1 to provide comparisons between supercomputer 
rates and workstation rates. All the timings reported 
here are for single-processor hours. The timings on 
the CRAY-YMP illustrate the benefits of using addi- 
tional memory to enhance vectorization. The overall 
cpu time of the code is decreased by a factor of 2.3 
if additional memory is used to perform several in- 
versions at once in the block tridiagonal solver. The 
vectorization in the block tridiagonal solver is the only 
difference between the scalar and vector versions of the 
code in this study. The scalar version of STAGE-2 runs
at 2.0 MFLOPS on even the least expensive workstation
used. With 2 subiterations per time step, 500 time
steps per cycle, and a 50,367-point grid, this translates 
to a turnaround time of 16 clock hours for one cycle 
on a dedicated low-end machine. For the fastest work- 
station used in this study, a cycle can be obtained in 
less than 2.6 clock hours. With the continuing rapid 
improvement in workstation technology, these timings 
will improve dramatically in the near future. 
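
The turnaround times quoted above follow from the table 1 timings. The sketch below assumes that one "iteration" in table 1 denotes a complete time step, including both subiterations, since that reading reproduces the quoted clock hours; the function name is hypothetical.

```python
def hours_per_cycle(cpu_per_step_point, grid_points, steps_per_cycle):
    """Clock hours for one cycle on a dedicated machine."""
    return cpu_per_step_point * grid_points * steps_per_cycle / 3600.0


low_end_hours = hours_per_cycle(2.34e-3, 50_367, 500)   # HP 9000/835
fastest_hours = hours_per_cycle(0.37e-3, 50_367, 500)   # HP 9000/720

# Ratio of scalar to vector CRAY-YMP timings from table 1.
vector_speedup = 81.4 / 35.7
```

These evaluate to roughly 16.4 hours, just under 2.6 hours, and a factor of about 2.3, in line with the figures in the text.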

One concern when implementing a computational 
fluid dynamics (CFD) code on a workstation is the 
effect of word length on the accuracy of the solution. 
The CRAY-class supercomputers have a 64-bit word 
length, and the workstations used in this study have a 
32-bit word length. To address this issue, a coarse-grid 
calculation using the experimental axial gap spacing 
was performed. Workstation-generated time-averaged 
surface pressures are compared with experimental data 
in figure 2. The time-averaged pressures are obtained 
by averaging the instantaneous static pressure over one 
cycle. The pressures are then nondimensionalized and 
plotted with respect to axial distance. The workstation 
results compare well with the experimental data and 
are nearly identical to the supercomputer results. This 
indicates that the 32-bit word length on the workstation 
is sufficient to generate accurate solutions. 

Time-averaged pressure contours are presented in 
figure 3 and standard deviation of pressure contours in 
figure 4 for the field around the second-stage rotor for 
three different axial gaps. In these figures, the pressure 
is averaged in the rotor frame of reference. The
standard deviation is also computed in the rotating
frame of reference for the second-stage rotor. The
standard deviation of the pressure field at each point
is computed as

$$P' = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( P_i - \bar{P} \right)^2}$$

where n is the number of time steps in a cycle, P_i is
the instantaneous pressure at time step i, and \bar{P}
is the time-averaged pressure. Darker shades indicate
higher pressures or higher levels of unsteadiness. The
loci of points described by the trailing edge of the
first-stage stator and the leading edge of the
second-stage stator are plotted as dashed lines. The
time-averaged flow fields are qualitatively similar for
the different axial gap cases. Contours of P' show that
the greatest unsteadiness is near the leading edge of
the stator. This is most pronounced for the
20%-axial-gap case (fig. 4(a)).
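
The two statistics plotted in figures 3 and 4 can be sketched as follows. The code assumes the population form of the standard deviation (division by n, the number of time steps in a cycle) and a hypothetical storage layout history[i][k] for the pressure at time step i and grid point k; neither detail is confirmed by the text.

```python
import math


def time_average(samples):
    """Mean of the instantaneous values over one cycle."""
    return sum(samples) / len(samples)


def standard_deviation(samples):
    """Root-mean-square deviation from the time average (divides by n)."""
    mean = time_average(samples)
    return math.sqrt(sum((p - mean) ** 2 for p in samples) / len(samples))


def pressure_statistics(history):
    """Return (time averages, standard deviations) for every grid point."""
    n_points = len(history[0])
    columns = [[step[k] for step in history] for k in range(n_points)]
    averages = [time_average(c) for c in columns]
    deviations = [standard_deviation(c) for c in columns]
    return averages, deviations


# Four time steps at two points: one unsteady, one steady.
averages, deviations = pressure_statistics(
    [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
)
```

A point with constant pressure has zero standard deviation, so the contours of P' isolate the regions where the pressure actually fluctuates over a cycle.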

Table 1. STAGE-2 performance statistics

Machine          Memory        cpu/it/pt     MFLOPS
CRAY-YMP(a)      128 Mwords    81.4 µsec      58
CRAY-YMP(b)      128 Mwords    35.7 µsec     131
HP 9000/835      16 Mbytes     2.34 msec       2.0
HP 9000/720      16 Mbytes     0.37 msec      12.9
IBM 6000/530     48 Mbytes     0.50 msec       9.6
SGI 4D25         16 Mbytes     1.57 msec       3.0
SGI 4D210GTX     32 Mbytes     1.24 msec       3.8
SGI 4D320VGX     48 Mbytes     0.95 msec       5.0

(a) STAGE-2 scalar version.
(b) STAGE-2 vector version.

Time-averaged pressure contours and standard
deviation of pressure contours are presented in figures 5
and 6, respectively, for the second-stage stator. The
time-averaged pressures and standard deviation are 
computed in the stator frame of reference. The locus 
of points described by the trailing edge of the
second-stage rotor is plotted in these figures as a dashed
line. The time-averaged field pressures for the 20%, 35%,
and 50% cases are similar to one another. The 20%-gap
case (fig. 6(a)) shows a higher level of unsteadiness
near the second-stage rotor trailing-edge locus than in
the rest of the field. The area immediately surrounding
the leading edge of the second-stage stator is also
more unsteady than the rest of the field, for each of
the axial gaps.

Figures 3-6 give a qualitative view of the steady 
and unsteady flow features in the second stage of the 
compressor. As can be surmised from these figures, 
the time-averaged surface pressures for the three axial 
gaps are similar to each other. They also closely re- 
semble those for the experimental gap configuration in 
figure 2, and hence are not reported here. The surface- 
pressure amplitudes do vary with axial gap; they are 
shown in figure 7 for each airfoil in the compressor. 
The pressure amplitudes are computed by determining 
the maximum and minimum pressure at each point on 
the surface over a cycle and then subtracting the min- 
imum pressure from the maximum pressure. As ex- 
pected, the 20%-gap case (fig. 7(a)) shows the greatest 
level of unsteadiness, and the 35%-gap case (fig. 7(b)) 
generally shows more unsteadiness than the 50%-gap 
case (fig. 7(c)). Because the airfoils are farther apart 
in the 35%- and 50%-gap cases, the effect of the po- 
tential fields is reduced. This reduces the overall level 
of unsteadiness of pressure in the compressor. 

Pressure-amplitude plots yield information regard- 
ing the level of unsteadiness in the compressor, but do 
not contain phase information. Force polar plots are
used to investigate both the frequencies and the
amplitudes associated with the unsteadiness. In figures 8
and 9, force polar plots are presented for the second 
stage of the compressor for all three axial gap cases. 
These plots are generated by integrating the instanta- 
neous surface-pressure field and resolving the resultant 
force into its axial and tangential components. The 
tangential force is then plotted against the axial force. 
For a periodic solution, this curve should close on
itself at the end of a cycle, so its closure is a good
measure of the convergence of a solution to a periodic
state. Figure 7
shows that the overall unsteadiness in the compressor 
increases as axial gap decreases. However, the inte- 
grated force field does not necessarily become more 
unsteady as the axial gap decreases. The force polar for 
the second-stage rotor at an axial gap of 20% (fig. 8(a)) 
shows more unsteadiness than either the 35%-axial-gap 
case (fig. 8(b)) or the 50%-axial-gap case. However, 
the integrated forces are more unsteady for the 50%- 
axial-gap case than for the 35%-axial-gap case. Ani- 
mations of these flows indicate that for the 50%-gap 
case, the second-stage rotor interacts with wakes that 
interacted with each other. This reduces the frequency 
with which the rotor passes through upstream wakes, 
but increases the amplitude of the force polar. For the 
35%-gap case, the wakes from upstream airfoils are 
encountered at different times, so the frequency of the 
force variation is higher, but the amplitude is reduced. 
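
The construction of a force polar described above can be sketched as follows. The example assumes the airfoil surface is given as a closed polygon of (x, y) points ordered counterclockwise, with x axial and y tangential, and that pressure is known at each vertex; it uses a simple midpoint rule per segment and neglects viscous stresses. The function names and data layout are hypothetical, not those of STAGE-2.

```python
import math


def pressure_force(points, pressures):
    """Net pressure force per unit span on a closed surface.

    points    -- (x, y) vertices ordered counterclockwise
    pressures -- static pressure at each vertex
    Returns (axial force, tangential force).
    """
    fx = fy = 0.0
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        p_mid = 0.5 * (pressures[k] + pressures[(k + 1) % n])
        dx, dy = x1 - x0, y1 - y0
        # For a counterclockwise contour the outward normal times the
        # segment length is (dy, -dx); pressure acts against it.
        fx -= p_mid * dy
        fy += p_mid * dx
    return fx, fy


def force_polar(points, pressure_history):
    """One (axial, tangential) force pair per stored time step."""
    return [pressure_force(points, p) for p in pressure_history]


def closure_gap(polar):
    """Distance between the first and last points of a force polar."""
    (fx0, fy0), (fx1, fy1) = polar[0], polar[-1]
    return math.hypot(fx1 - fx0, fy1 - fy0)


# Unit square: uniform pressure, then higher pressure on the face x = 0.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
uniform_force = pressure_force(square, [1.0, 1.0, 1.0, 1.0])
left_loaded_force = pressure_force(square, [1.0, 0.0, 0.0, 1.0])
```

If the stored history runs from the start of one cycle to the start of the next, closure_gap gives a single number that tends to zero as the solution approaches a periodic state.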

A similar effect is seen in figure 9 for the forces 
on the second-stage stator. The amplitude of the forces 
is smallest for the 20%-gap case (fig. 9(a)) and largest 
for the 50%-gap case (fig. 9(c)). Note that the pas- 
sage of each individual wake can be seen in the force 
polar for the 35%-gap case. The IGV wake is seen 
as the smallest amplitude loop (on the left). The far- 
ther downstream the wake is generated, the larger the 
amplitude of the loop. Despite the fact that the
unsteadiness of the pressure field increases as the axial
gap decreases, the actual force amplitude on the airfoil 
may decrease. 

CONCLUSIONS 

A third-order-accurate upwind-biased thin-layer 
Navier-Stokes zonal code (STAGE-2) was used to in- 
vestigate the flow within a multistage compressor. It 
was shown that STAGE-2 can be used to compute 
unsteady, multistage-compressor flows in a workstation
environment. The rapid development of workstation
technology has made and will continue to make possible the regular
use of workstations as valuable sources of computa- 
tional time. In the future, STAGE-2 will be used in a 
networked workstation environment to investigate dis- 
tributed processing of unsteady turbomachinery flows. 
This will further increase the value of workstation net- 
works as a source of computational time. 

The effects of axial gap spacing on the unsteady 
flow within a 2½-stage compressor were investigated.
As the axial gap is reduced, the potential interaction 
between airfoils becomes more significant. However, 
the wake-interaction effects can vary with axial gap 
depending on the relative phase between the wakes. 
The force amplitude can be smaller even though the 
gap has been decreased. It is surmised that airfoil phase 
must be considered in estimating interaction effects. 
Figure 2. Time-averaged surface pressures, experimental gap configuration.

Figure 5. Second-stage-stator time-averaged pressure contours. (a) 20% gap, (b) 35% gap, (c) 50% gap.

Figure 6. Second-stage-stator standard-deviation pressure contours. (a) 20% gap, (b) 35% gap, (c) 50% gap.

Figure 8. Second-stage-rotor force polars. (a) 20% gap, (b) 35% gap, (c) 50% gap.

Figure 9. Second-stage-stator force polars. (a) 20% gap, (b) 35% gap, (c) 50% gap.